Novel machine‐learning prediction tools for overall survival of patients with chondrosarcoma: Based on recursive partitioning analysis

Abstract
Background: Chondrosarcoma (CHS), a bone malignancy, poses a significant challenge due to its heterogeneous nature and resistance to conventional treatments. There is a clear need for advanced prognostic instruments that can integrate multiple prognostic factors to deliver personalized survival predictions for individual patients. This study aimed to develop a novel prediction tool based on recursive partitioning analysis (RPA) to improve the estimation of overall survival for patients with CHS.
Methods: Data from the Surveillance, Epidemiology, and End Results (SEER) database were analyzed, including demographic, clinical, and treatment details of patients diagnosed between 2000 and 2018. Using the C5.0 algorithm, decision trees were created to predict survival probabilities at 12, 24, 60, and 120 months. The performance of the models was assessed through confusion scatter plots, accuracy rate, receiver operator characteristic (ROC) curves, and area under the ROC curve (AUC).
Results: The study identified tumor histology, surgery, age, visceral (brain/liver/lung) metastasis, chemotherapy, tumor grade, and sex as critical predictors. Decision trees revealed distinct patterns for survival prediction at each time point. The models showed high accuracy (82.40%–89.09% in the training group and 82.16%–88.74% in the test group) and discriminatory power (AUC: 0.806–0.894 in the training group and 0.808–0.882 in the test group). An interactive web‐based shiny APP (URL: https://yangxg1209.shinyapps.io/chondrosarcoma_survival_prediction/) was developed, simplifying the survival prediction process for clinicians.
Conclusions: This study successfully employed RPA to develop a user‐friendly tool for personalized survival predictions in CHS. The decision tree models demonstrated robust predictive capabilities, with the interactive application facilitating clinical decision‐making. Future prospective studies are recommended to validate these findings and further refine the predictive model.


| INTRODUCTION
In comparison to Ewing sarcoma and osteosarcoma, chondrosarcoma is considered a less aggressive form of cancer [5]. It most commonly affects the pelvis and proximal femur, posing significant challenges due to its location and aggressive behavior [6,7]. The resistance of chondrosarcoma to conventional treatments is largely attributed to its rich extracellular matrix, limited vascularization, low rate of cell division, and slow growth [11].
The prognosis for patients with chondrosarcoma varies significantly and is largely contingent upon factors such as tumor grade, size, location, and other patient-specific characteristics. Therapeutic strategies diverge considerably across tumor grades. Low-grade chondrosarcomas, with their lower risk of local recurrence and distant metastasis, are associated with longer overall survival and can often be managed effectively through curettage. Conversely, higher-grade tumors carry a greater likelihood of local recurrence and metastasis, leading to reduced overall survival and necessitating more extensive surgical excision or even amputation in some cases [12,13]. While histological grading is a crucial determinant of patient survival, it is subjective and can vary among pathologists, underscoring the need for more objective and reliable indicators of overall survival. Identifying independent prognostic factors through large patient cohort studies is valuable for both patients and clinicians: it offers insight into future prognosis, guides the selection of optimal treatment strategies, and helps determine the extent of surgical intervention. Given the number of prognostic factors identified, there is a clear need for advanced prognostic instruments that can integrate multiple variables to deliver personalized survival predictions for individual patients.
In the context of chondrosarcoma, many prediction models have been proposed, primarily based on logistic regression or Cox proportional hazards regression [18,19]. When using these prediction models, users must manually score each predictor and then sum the scores to estimate overall survival.
Recursive partitioning analysis (RPA) and decision tree models have emerged as attractive alternatives to traditional statistical approaches [23]. RPA is a nonparametric method that recursively splits the patient population into subgroups based on optimal cutoff values. This results in a series of nested partitions or nodes, each representing a distinct patient subgroup with homogeneous outcomes. Decision trees, constructed using RPA, visually represent the relationships between predictors and outcomes, providing a straightforward and intuitive tool for clinical decision support [24]. RPA and decision tree models can accommodate complex interactions and produce easily interpretable models that are accessible to clinicians without specialized statistical training. Additionally, they can be readily implemented in software and web-based platforms, facilitating integration into clinical workflows and improving accessibility to a wide range of healthcare providers. The application of RPA and decision tree models in the field of chondrosarcoma is still in its infancy.
Therefore, by leveraging the strengths of RPA and decision tree modeling, we aimed to create a user-friendly tool that integrates multiple prognostic factors to generate personalized survival predictions for patients with chondrosarcoma. Furthermore, we planned to translate this model into an interactive software application and web-based platform, making it widely available to clinicians as a decision aid in selecting appropriate treatment strategies for their patients with chondrosarcoma.
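To make the idea of recursive partitioning concrete, the following is a minimal sketch in plain Python. It is not the C5.0 implementation used in this study: it assumes categorical predictors only, uses the gain ratio (the splitting criterion associated with the C4.5/C5.0 family) as a stand-in, and runs on a hypothetical six-patient toy dataset. The function names (`entropy`, `gain_ratio`, `grow_tree`) and the predictor names are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """Information gain from splitting on `feature`, divided by the split information."""
    n = len(rows)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    if len(groups) < 2:  # a single category cannot split the node
        return 0.0
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return (entropy(labels) - remainder) / split_info


def grow_tree(rows, labels, features, max_depth=5):
    """Recursively partition the data; each node stores the fraction with label 1 (alive)."""
    node = {"n": len(labels), "p_alive": sum(labels) / len(labels)}
    if max_depth == 0 or len(set(labels)) == 1 or not features:
        return node
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) <= 0:
        return node
    remaining = [f for f in features if f != best]
    children = {}
    for value in sorted({row[best] for row in rows}):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = grow_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, max_depth - 1
        )
    return {"split": best, "children": children, **node}


# Hypothetical toy data: 1 = alive at the horizon, 0 = dead.
rows = [
    {"surgery": "yes", "metastasis": "no"},
    {"surgery": "yes", "metastasis": "no"},
    {"surgery": "yes", "metastasis": "yes"},
    {"surgery": "no", "metastasis": "yes"},
    {"surgery": "no", "metastasis": "no"},
    {"surgery": "no", "metastasis": "yes"},
]
labels = [1, 1, 1, 0, 0, 0]
tree = grow_tree(rows, labels, ["surgery", "metastasis"])
```

In this toy example the root split falls on `surgery`, because it separates the two outcomes completely. Continuous predictors such as age would additionally require a search over candidate cutoff values, which this sketch omits.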

| MATERIALS AND METHODS
This investigation adhered to the principles outlined in the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement to ensure the clarity and reproducibility of our research findings.

| Data source
The dataset for our analysis was sourced from the Surveillance, Epidemiology, and End Results (SEER) database, a nationwide cancer registry system administered by the National Cancer Institute (NCI) in the US. Our study encompassed individuals diagnosed with chondrosarcoma, with pertinent data extracted from the registry. Key data items included patient demographics, clinical features, therapeutic interventions, and longitudinal survival information.

| Inclusion and exclusion criteria
Records retrieved from the SEER database had to meet the following inclusion criteria: (1) a definitive diagnosis of chondrosarcoma, as evidenced by International Classification of Diseases for Oncology, 3rd edition (ICD-O-3) codes; (2) diagnoses made within the period extending from 2000 to 2018; and (3) availability of comprehensive data regarding baseline characteristics, tumor histology, tumor grade, therapeutic modalities, and clear survival status at each time point (12, 24, 60, and 120 months).
Patients were ineligible for inclusion under the following conditions: (1) diagnosis before the year 2000; (2) inaccuracies or ambiguities concerning the duration of survival; and (3) records flagged as censored during the follow-up interval, indicating incomplete survival data. The primary outcome in this cohort study was the patient's survival status (alive or dead) at 12, 24, 60, and 120 months. After removing the records censored during follow-up and those without comprehensive survival information, a total of 3894, 3674, 3147, and 2409 patients were included in the four cohorts (as shown in the flowchart in Figure 1).
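The construction of the four cohorts can be illustrated with a short sketch. It assumes each record carries a follow-up time in months and a death indicator (the field names `months` and `dead` are hypothetical), and it assumes that a death occurring exactly at the horizon counts as dead at that horizon; the original extraction rules may differ on such boundary cases.

```python
HORIZONS = (12, 24, 60, 120)  # months, as used in this study


def status_at(horizon, survival_months, dead):
    """Return 'dead', 'alive', or None when the record is censored before the horizon."""
    if dead and survival_months <= horizon:
        return "dead"
    if survival_months >= horizon:
        return "alive"  # followed up to (or beyond) the horizon
    return None  # alive at last contact, but follow-up ended too early


def build_cohorts(records, horizons=HORIZONS):
    """Build one cohort per horizon, dropping records with undetermined status."""
    cohorts = {}
    for h in horizons:
        cohort = []
        for rec in records:
            status = status_at(h, rec["months"], rec["dead"])
            if status is not None:
                cohort.append({**rec, "status": status})
        cohorts[h] = cohort
    return cohorts


# Hypothetical toy records, not SEER data.
toy_records = [
    {"id": 1, "months": 8, "dead": True},
    {"id": 2, "months": 30, "dead": False},
    {"id": 3, "months": 70, "dead": True},
    {"id": 4, "months": 130, "dead": False},
    {"id": 5, "months": 10, "dead": False},
]
cohorts = build_cohorts(toy_records)
cohort_sizes = {h: len(c) for h, c in cohorts.items()}
```

Because a record censored at 30 months still has a known status at 12 and 24 months but not at 60 or 120 months, the cohorts shrink as the horizon lengthens, which mirrors the decreasing cohort sizes reported above.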

| Statistical analyses
Continuous variables such as age and tumor size were reported as mean ± standard deviation (SD), while other variables were expressed as percentages. Following the exclusion of censored records, four distinct cohorts were established. Each cohort was then randomly partitioned into training and test datasets at a ratio of 7:3. To predict survival status at each time point (12, 24, 60, and 120 months), dichotomous decision trees were built using the C5.0 algorithm as implemented in SPSS Modeler version 18.0 (IBM Corp., Armonk, New York, USA). The analytical workflow, from data importation to model training and validation, is depicted in Figure 2.
To assess the performance of the newly developed decision trees, we used the confusion matrix (plotted as a scatter diagram) and ROC curves, from which we computed the accuracy rate (ACC) and the area under the ROC curve (AUC). Model validation was performed in both the training and test datasets. Subsequently, the models were adapted into an interactive software tool and a web-based platform using the "shiny" package within R version 4.3.3 (R Foundation for Statistical Computing, Vienna, Austria). The model validation was carried out in R, with statistical significance set at p < 0.05.
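The 7:3 random partition described above can be sketched as follows. This is an illustrative stand-in for the partitioning step in SPSS Modeler, not a reproduction of it: the function name `split_train_test`, the seed, and the simple (unstratified) shuffle are assumptions.

```python
import random


def split_train_test(records, train_fraction=0.7, seed=2024):
    """Shuffle a copy of the records reproducibly and cut it at the training fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in identifiers for the 3894 patients of the 12-month cohort.
patient_ids = list(range(3894))
train_ids, test_ids = split_train_test(patient_ids)
```

Fixing the seed makes the partition reproducible, so the same patients fall into the training and test sets on every run.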
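For readers less familiar with these metrics, the sketch below computes the confusion counts, ACC, and AUC for a hypothetical set of eight patients. The AUC is computed from its rank interpretation: the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The data, the 0.5 classification threshold, and the function names are illustrative assumptions, not values from this study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = alive)."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(t == 1 and p == 1 for t, p in pairs),
        "fn": sum(t == 1 and p == 0 for t, p in pairs),
        "fp": sum(t == 0 and p == 1 for t, p in pairs),
        "tn": sum(t == 0 and p == 0 for t, p in pairs),
    }


def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted status equals the observed status."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def auc(y_true, scores):
    """Probability that a positive case outscores a negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical observed statuses and predicted survival probabilities.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
scores = [0.9, 0.8, 0.8, 0.3, 0.6, 0.4, 0.3, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

counts = confusion_counts(y_true, y_pred)
acc = accuracy(y_true, y_pred)
area = auc(y_true, scores)
```

ACC depends on the chosen threshold, whereas AUC summarizes discrimination across all thresholds, which is why both are reported.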

| Baseline characteristics
Consecutive cohorts of 3894, 3674, 3147, and 2409 patients with definitive survival status were included for the 12-, 24-, 60-, and 120-month analyses, respectively. Table 1 presents the baseline characteristics of the included patients. In the 12-month cohort, the mean age was 52.8 ± 18.5 years; 54.3% of patients were male; the mean tumor size was 82.8 ± 55.9 mm; surgery was performed in 85.3% of patients, whereas 14.5% and 8.1% received radiotherapy and chemotherapy, respectively; and metastases were documented in 0.8% of patients in bone and in 3.0% in the brain, liver, or lung. Survival probabilities at 12, 24, 60, and 120 months were 87.3%, 80.0%, 67.1%, and 48.4%, respectively (as depicted in Figure 3).

| Development and validation of the decision tree models
Using the C5.0 algorithm, we identified the most influential prognostic indicators: tumor histology, surgery procedure, age, visceral metastasis to brain/liver/lung, chemotherapy, tumor grade, and sex. The importance rankings of the predictors for the four cohorts are illustrated in Figure 4. These prognostic markers served as the foundation for constructing decision tree models designed to forecast survival status at 12, 24, 60, and 120 months. The decision trees are presented in Figures 5-8. The depth of each decision tree was limited to five layers. In the 12-month model, surgery procedure, age, histology type, visceral metastasis to brain/liver/lung, and chemotherapy emerged as key determinants for prediction; in the 24-month model, chemotherapy, surgery procedure, age, and histology type were included; in the 60-month model, chemotherapy, age, surgery procedure, and tumor grade were included; in the 120-month model, age, chemotherapy, tumor grade, and sex were included. These graphical models allow clinicians to quickly read off predicted survival rates at each time interval.
Figure 9 shows the confusion scatter diagrams that map the agreement between predicted and actual survival statuses at 12 (Figure 9A), 24 (Figure 9B), 60 (Figure 9C), and 120 (Figure 9D) months, for both the training and testing datasets. The plots exhibit a high degree of alignment between the predicted and actual survival statuses, as the majority of data points cluster closely along the diagonal line. The accuracy rates were 89.08%, 86.88%, 82.40%, and 82.86% for the training group, and 88.74%, 85.13%, 82.16%, and 85.73% for the test group at 12, 24, 60, and 120 months, respectively. Furthermore, Figure 10 provides the ROC curves for both training and testing datasets at 12 (Figure 10A), 24 (Figure 10B), 60 (Figure 10C), and 120 (Figure 10D) months. The AUC values indicated favorable discriminatory power: 0.894, 0.834, 0.806, and 0.820 for the training set, and 0.882, 0.865, 0.808, and 0.831 for the testing set, underscoring the robustness and reliability of our predictive models.

FIGURE 1 The flow chart of the study design. ACC, accuracy rate; AUC, area under ROC curve; CHS, chondrosarcoma; ROC, receiver operator characteristic; RPA, recursive partitioning analysis.

FIGURE 2 The screenshot captures the comprehensive analysis flow within SPSS Modeler, detailing each step from initial data loading through to final model validation. Data undergoes rigorous screening and classification before being subjected to the C5.0 algorithm, which constructs dichotomous decision trees tailored to predict survival at 12, 24, 60, and 120 months.

| Interactive application and web-based platform
To avoid the manual computation otherwise required to read graphical decision tree models, we developed an interactive application. The application features five tabs, each with a specific purpose: an overview of the application, detailed descriptions of the demographic characteristics of the study population, the process of constructing the model, an assessment of model performance, and an intuitive interface for predicting survival probabilities (Figure 11). Users can customize their review of the study population's demographics, examine the model creation and validation processes, and obtain automated survival probability predictions at the four predefined time points. To forecast survival probabilities, users simply select the prognostic factors and receive bar and pie charts illustrating the distribution of survival statuses at the four time points. To promote widespread use of this predictive tool, we have made it accessible via a web server (URL: https://yangxg1209.shinyapps.io/chondrosarcoma_survival_prediction/). For instance, a male patient, aged 60, diagnosed with dedifferentiated (grade IV: undifferentiated) chondrosarcoma and exhibiting visceral metastasis to the brain/liver/lung, who has undergone surgical intervention and chemotherapy, is projected to have survival probabilities of 62.86%, 56.13%, 28.81%, and 26.14% at 12, 24, 60, and 120 months.
Our study has identified critical prognostic indicators, including tumor histology, surgery procedure, age, visceral (brain/liver/lung) metastasis, chemotherapy, tumor grade, and sex, that significantly influence survival outcomes of patients with chondrosarcoma. The development of decision tree models using the C5.0 algorithm has yielded a promising prognostic tool for predicting overall survival. Recognizing the practical challenges of implementing graphical models in routine practice, we have also developed an interactive application and web-based platform. This simplifies the process of survival prediction, enabling clinicians to input patient-specific information and receive immediate, visual feedback on survival probabilities.

FIGURE 10 Receiver operating characteristic (ROC) curves illustrate the performance of the decision tree models at distinct time points: 12 (A), 24 (B), 60 (C), and 120 months (D), with separate analyses conducted for both the training and test datasets. The area under the ROC curve (AUC), a metric quantifying the discriminative ability of the models, was found to be 0.894, 0.834, 0.806, and 0.820 for the training set, and 0.882, 0.865, 0.808, and 0.834 for the test set, respectively. These figures reflect the robustness and reliability of the models in predicting outcomes at various follow-up durations.

| Prognostic indicators for the survival statuses at various time points
The identification of pivotal prognostic indicators highlights the multifaceted nature of chondrosarcoma survival rates. The histological type and tumor grade of a chondrosarcoma reflect its biological behavior. Typically, for low-grade chondrosarcomas, effective and safe treatment can be achieved through lesion curettage, while high-grade tumors require more extensive surgical resection or even amputation [12,13]. Generally, patients who undergo complete surgical excision have a better overall performance status than those who do not receive surgery, and correspondingly higher overall survival rates [14,18,19]. Age at diagnosis has also been confirmed as an independent prognostic factor for chondrosarcoma survival time in numerous studies [14-16,18,26], which may be because older patients have poorer overall health, are more susceptible to becoming frail after developing the disease, and are less able to tolerate extensive treatments. Patients with concurrent visceral (brain, liver, or lung) metastases have more advanced tumor progression, which damages other vital organs and reduces overall treatment effectiveness, thus lowering their overall survival rate [27]. Perioperative chemotherapy can achieve better local control and reduce long-term metastasis risk for certain higher-grade tumors that are sensitive to chemotherapy, thereby extending survival times [15,18,19]. Our study found that among Grade I chondrosarcoma patients aged between 65 and 76 years, female sex had a protective effect on 120-month survival status compared with male sex (Figure 8). We hypothesize that in this age group, women have relatively lower estrogen levels and correspondingly lower bone density, while lower-grade tumors consist primarily of extracellular cartilaginous matrix with fewer cellular components, thus hindering the growth of tumor tissue. The impact of gender on the overall survival of patients with chondrosarcoma has also been corroborated in several prior studies [18,19,26].

FIGURE 11 Interactive survival prediction software for chondrosarcoma patients. The software interface consists of five dedicated tab pages: "Introduction," "Datasets," "Model," "Performance," and "Survival Prediction." Within the "Datasets" page, users can review and perform automated, visual data analysis on the datasets used in the modeling process (A). On the "Survival Prediction" page, once patient clinical characteristics have been specified, the software automatically generates projected survival probabilities at four predetermined time intervals (B).

The temporal dynamics of prognostic factors in our study reveal a complex interplay between various clinical characteristics and their impact on survival outcomes at different stages of the disease. This finding challenges the conventional approach of assessing prognostic factors solely based on their overall effect across the entire follow-up period. Our results suggest that the clinical utility of these factors may vary significantly depending on the temporal context in which they are considered. For instance, the identification of histological type and presence of visceral metastasis as potent predictors of short-term (up to 24 months) survival underscores the immediate clinical relevance of these factors in the early management of patients. Clinicians may use this knowledge to prioritize interventions aimed at mitigating the acute risks associated with these conditions. Conversely, the emergence of tumor grade and sex as critical determinants of long-term (≥60 months) survival highlights the need for sustained surveillance and tailored treatment strategies that account for the chronic implications of these factors. Moreover, the consistent influence of age at diagnosis and receipt of chemotherapy across all follow-up intervals suggests that these variables play a foundational role in shaping the overall survival trajectory. Their ubiquitous impact emphasizes the importance of incorporating these factors into baseline risk assessments and treatment planning, regardless of the temporal focus of care.
These observations not only refine our understanding of the prognostic landscape but also have practical implications for personalized medicine. By recognizing the shifting significance of prognostic factors over time, healthcare providers can adapt their prognostic models and therapeutic approaches to reflect the changing needs of patients as their disease progresses. This adaptive approach stands in contrast to static prognostic models that fail to capture the dynamic nature of disease and patient responses, thereby potentially limiting the precision and effectiveness of clinical decision-making. Therefore, our study contributes to the evolving paradigm of prognostication by illuminating the temporal specificity of prognostic factors. This nuanced perspective enhances the granularity of prognostic models and supports a more responsive and individualized approach to patient care, ultimately aiming to optimize survival outcomes and quality of life throughout the disease continuum.

| Predictive systems established for survival estimation
Once the prognostic factors influencing chondrosarcoma survival have been comprehensively screened, predictive systems are needed that integrate all independent predictors for a more convenient estimation of a patient's overall survival probability. Initially, clinicians typically utilize the TNM system, which assesses patients' initial survival status and risk stratification based on three key components: tumor characteristics (T), lymph nodes (N), and distant metastasis (M) [28]. Additionally, the AJCC staging system has long been widely used for the prognostic evaluation of malignant tumors [28,29]. However, an increasing number of studies acknowledge the notable limitations of the AJCC system, as it only considers tumor size and histological metastasis [30,31]. The modeling approaches employed in previous studies [32,33] often include Cox proportional hazards regression or logistic regression analyses, constructing the models as graphical prediction systems. Wu et al. [32] established a nomogram model for predicting the overall survival of patients with limb chondrosarcomas, based on uni- and multivariable Cox regression. They identified that age, site, grade, tumor size, histology, stage, and use of surgery, radiotherapy, and chemotherapy are significantly associated with overall survival. Similarly, Sun et al. [16] developed a nomogram to predict overall survival in patients with high-grade chondrosarcoma, based on uni- and multivariate Cox analyses. In that study, age, histology type, tumor size, AJCC stage, regional expansion, and surgery were identified as independent prognostic factors. During the establishment of these models, however, all patients are analyzed as a single cohort, potentially obscuring prognostic factors that are specific to particular subgroups and making them difficult to uncover during the analysis. In contrast, the decision tree model developed in this study utilizes the C5.0 algorithm and RPA, continuously segmenting the cohort based on the most significant predictive factor at each step. Throughout the development of the tree, the variable at each branch point is the most significant predictive factor for that subgroup, which facilitates the discovery of deeper interactions between variables and enhances the diagnostic accuracy of the model [38,39].
Furthermore, when users employ these previously established predictive systems for risk assessment, they must manually score each predictive factor and calculate the total score, which is then converted into corresponding survival probabilities for different time periods. This process remains somewhat complex, which hinders adoption of the models. To address this, we have encapsulated the established predictive model into an app and deployed it on the server at shinyapps.io. The app allows users to visually review the original datasets included in the analysis and to customize explorations of the patients' epidemiological characteristics as needed. It also provides access to the entire process of model construction and validation. Most importantly, after the clinical features of the patient to be predicted have been selected, the app automatically outputs the overall survival probabilities for 1, 2, 5, and 10 years, along with visual representations.
The findings of our study represent a step towards a tool that could support clinical decision-making and patient engagement. As we contemplate the translation of our research into a clinically applicable interactive application and web-based platform, several critical considerations emerge. Firstly, the application must be designed with the end-user in mind, ensuring that it is intuitive, informative, and empowering. Clinicians require a tool that simplifies complex data into actionable insights, thereby optimizing their decision-making process. Patients, on the other hand, stand to benefit from a resource that explains their treatment options, fostering greater understanding and encouraging active participation in their healthcare journey.
Key features of the envisioned application include personalized treatment protocols, interactive educational modules, clinical decision support tools, user feedback mechanisms, and stringent data security measures. These elements collectively aim to create a user-centric platform that integrates seamlessly with existing clinical workflows and adheres to the highest standards of patient confidentiality. However, the path from conceptualization to implementation is fraught with challenges. It necessitates a multidisciplinary approach, engaging stakeholders from diverse backgrounds (researchers, clinicians, software developers, patient advocates, and legal experts) to ensure a holistic and effective solution. Pilot testing and phased rollouts are essential to evaluate the application's performance in real-world scenarios, allowing for iterative refinements based on user feedback and empirical data.

| Limitations
This study presents several limitations. Firstly, as a retrospective study, it is subject to potential biases in the data obtained, which may affect the validity of the results. To further verify the findings of this study, prospective clinical research is necessary. Secondly, there are inherent limitations associated with the SEER database itself, which may impact the generalizability of the conclusions of the study. Missing data is a well-recognized issue that has significant implications for the robustness and interpretability of survival analyses. While traditional approaches to handling missingness, such as complete case analysis or imputation techniques, offer potential solutions, they each come with their own set of limitations. Complete case analysis, for instance, can lead to biased results if the missingness is nonrandom, and it reduces the sample size, potentially limiting the power of the study. On the other hand, imputation methods, especially those that rely on predictive models, may introduce spurious correlations and inflate the precision of estimates if the missing data are extensive. In light of these considerations, our study took another approach by incorporating the "unknown" category directly into the model framework. This strategy acknowledges the uncertainty associated with missing prognostic factors and avoids the pitfalls of assuming complete knowledge where none exists. By doing so, we aim to preserve the integrity of the dataset and provide a more realistic representation of the clinical setting, where prognostic information is not always fully available. However, the inclusion of an "unknown" category does present interpretational complexities. In the context of clinical decision-making, the "unknown" category serves as a reminder of the need for prudence and the reliance on available evidence. Lastly, the survival outcome variable (survival time and status) was transformed into binary variables representing survival status at 12, 24, 60, and 120 months. Unlike the Cox model, the decision tree model used in this study does not handle censored data effectively. Consequently, censored data could significantly influence the accuracy of the results. To mitigate this issue, the censored records at the four time points were excluded from the analysis beforehand, leading to varying numbers of patients across the corresponding cohorts at each time point.
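The "unknown" category strategy described above amounts to a simple recoding step before model training. The sketch below illustrates it under stated assumptions: the field names, the set of values treated as missing, and the label "Unknown" are hypothetical and are not taken from the study's actual preprocessing.

```python
MISSING_VALUES = (None, "", "NA")  # assumed markers of a missing entry


def fill_unknown(record, fields, label="Unknown"):
    """Return a copy of the record with missing values in `fields` recoded to `label`."""
    cleaned = dict(record)
    for field in fields:
        if cleaned.get(field) in MISSING_VALUES:
            cleaned[field] = label
    return cleaned


# Hypothetical patient record with a missing tumor grade.
patient = {"age": 60, "sex": "male", "grade": None, "histology": "dedifferentiated"}
recoded = fill_unknown(patient, ["grade", "histology", "chemotherapy"])
```

With this recoding, a tree can split on "Unknown" like any other category, so patients with incomplete records remain in the cohort rather than being dropped or imputed.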
In conclusion, this investigation has identified pivotal prognostic indicators that markedly affect the survival prospects of individuals with chondrosarcoma. By combining these prognostic variables, decision tree models have been constructed that tailor the estimation of survival to the individual level, a tool likely to be of value in shaping therapeutic plans and patient consultations. The validation of these models shows close alignment between forecasted and observed survival outcomes, supporting their predictive capability. To enhance user accessibility, we have introduced an interactive online application that streamlines the survival prediction process. It allows healthcare professionals to promptly enter patient-specific data and receive instant, graphical representations of survival likelihoods, thereby supporting clinical decision-making.

FIGURE 4 The hierarchy of significance among predictors within the decision tree framework at distinct time intervals: 12 (A), 24 (B), 60 (C), and 120 months (D). This ranking reflects the relative contribution of each variable in determining survival probability at varying stages.

FIGURE 5 Decision tree model for 12-month survival status. CMT, chemotherapy; CHS, chondrosarcoma.

FIGURE 6 Decision tree model for 24-month survival status. CHS, chondrosarcoma; CMT, chemotherapy.

FIGURE 7 Decision tree model for 60-month survival status. CMT, chemotherapy.

FIGURE 8 Decision tree model for 120-month survival status. CMT, chemotherapy.

FIGURE 9 Confusion scatter diagrams show the agreement between predicted and observed survival outcomes at 12 (A), 24 (B), 60 (C), and 120 months (D) for both the training and testing groups. Each model exhibited commendable concordance between the predicted and actual survival states. The accuracy rate (ACC) serves as an indicator of the overall predictive performance, confirming the efficacy of the models across different time points.

| Collected variables and data processing

TABLE 1 Baseline characteristics of the included patients for four different time points.